[NVBug 5996622] skip syncing for Llama4 experts by shengliangxu · Pull Request #1136 · NVIDIA/Model-Optimizer

shengliangxu · 2026-03-30T06:38:01Z

What does this PR do?

Type of change: But fix

Llama4 experts are already fused

Testing

Rerun NVBug 5996622 QA test.

Summary by CodeRabbit

Bug Fixes
- Improved reliability of quantization synchronization for expert modules with shared quantizers, preventing unnecessary synchronization attempts.
Documentation
- Updated documentation to clarify quantization behavior for fused expert module configurations.

Llama4 experts are already fused Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>

coderabbitai · 2026-03-30T06:38:16Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e7a90207-8181-4774-a592-a47cb024c175

📥 Commits

Reviewing files that changed from the base of the PR and between a3f5c46 and 6463b5e.

📒 Files selected for processing (1)

modelopt/torch/quantization/plugins/huggingface.py

📝 Walkthrough

Walkthrough

Updated _QuantSparseMoe.layer_sync_moe_local_experts_amax method to add an additional guard that prevents amax synchronization for non-iterable (fused) experts modules. The method now attempts iteration and returns early on TypeError, ensuring sync_moe_expert_amax is only invoked for iterable experts.

Changes

Cohort / File(s)	Summary
Fused Experts Guard `modelopt/torch/quantization/plugins/huggingface.py`	Added control flow guard using `iter(self.experts)` to skip `sync_moe_expert_amax` invocation for non-iterable (fused) experts modules. Expanded docstring to document synchronization behavior for fused experts with shared quantizers.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title '[NVBug 5996622] skip syncing for Llama4 experts' directly reflects the main change: skipping amax synchronization for Llama4 (fused) experts modules.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Security Anti-Patterns	✅ Passed	The PR introduces a minor control flow modification adding a try-except block to safely check if self.experts is iterable, addressing fused experts modules without introducing security anti-patterns.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch shengliangx/nvbug-5996622

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-03-30T06:44:41Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-03-30 22:11 UTC

codecov · 2026-03-30T06:53:28Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.21%. Comparing base (f04e106) to head (189f236).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1136      +/-   ##
==========================================
+ Coverage   70.19%   70.21%   +0.02%     
==========================================
  Files         230      230              
  Lines       26073    26073              
==========================================
+ Hits        18302    18308       +6     
+ Misses       7771     7765       -6

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

meenchen

Is this a regression? wondering how we didn't find this issue in the previous releases.

shengliangxu · 2026-03-30T22:10:17Z

Is this a regression? wondering how we didn't find this issue in the previous releases.

It's some new code logic for syncing local experts. That's why we need Modeling Lib for per-model logics.

### What does this PR do? Type of change: But fix Llama4 experts are already fused ### Testing Rerun NVBug 5996622 QA test.  ## Summary by CodeRabbit * **Bug Fixes** * Improved reliability of quantization synchronization for expert modules with shared quantizers, preventing unnecessary synchronization attempts. * **Documentation** * Updated documentation to clarify quantization behavior for fused expert module configurations.  Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>

[NVBug 5996622] skip syncing for Llama4 experts

6463b5e

Llama4 experts are already fused Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>

shengliangxu requested a review from a team as a code owner March 30, 2026 06:38

shengliangxu requested a review from mxinO March 30, 2026 06:38

Merge branch 'main' into shengliangx/nvbug-5996622

2a1c3a2

shengliangxu requested a review from cjluo-nv March 30, 2026 16:46

Merge branch 'main' into shengliangx/nvbug-5996622

9d4277c

kevalmorabia97 added the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Mar 30, 2026

shengliangxu requested a review from meenchen March 30, 2026 17:16

Merge branch 'main' into shengliangx/nvbug-5996622

189f236

meenchen approved these changes Mar 30, 2026

View reviewed changes

shengliangxu merged commit 21aa8a9 into main Mar 30, 2026
45 checks passed

shengliangxu deleted the shengliangx/nvbug-5996622 branch March 30, 2026 22:10

kevalmorabia97 removed the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Apr 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[NVBug 5996622] skip syncing for Llama4 experts#1136

[NVBug 5996622] skip syncing for Llama4 experts#1136
shengliangxu merged 4 commits into
mainfrom
shengliangx/nvbug-5996622

shengliangxu commented Mar 30, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Mar 30, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Uh oh!

github-actions Bot commented Mar 30, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Mar 30, 2026 •

edited

Loading

Uh oh!

meenchen left a comment

Uh oh!

shengliangxu commented Mar 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

shengliangxu commented Mar 30, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

github-actions Bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov Bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

meenchen left a comment

Choose a reason for hiding this comment

Uh oh!

shengliangxu commented Mar 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

shengliangxu commented Mar 30, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 30, 2026 •

edited

Loading

github-actions Bot commented Mar 30, 2026 •

edited

Loading

codecov Bot commented Mar 30, 2026 •

edited

Loading